HR Analytics Employee Attrition and Performance
BCon 147: special topics
1 Project overview
In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.
2 Scenario
Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.
Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.
3 Understanding data source
The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.
This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.
## the datatable function from the DT package creates an HTML widget display of the dataset
## install the DT package if it is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |>
DT::datatable()
4 Data wrangling and management
Libraries
Before we start working on the dataset, we need to load the necessary libraries that will be used for data wrangling, analysis, and visualization. Make sure to load the following libraries here. To install a package, you can use the install.packages function. Some packages will be needed later in this project, so make sure to install them as needed and load them here.
# load all your libraries here
library(readr)
library(readxl)
library(haven)
library(tidyverse)
library(dplyr)
library(skimr)
library(ggplot2)
library(janitor)
library(tidytext)
library(lubridate)
library(DT)
library(magrittr)
4.1 Data importation
- Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.
- Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more about the left_join function in the dplyr documentation.
- Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from the DT package.
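If an employee has several performance reviews (the merged data carries performance_id and review_date columns), the left join is one-to-many. Each employee row is repeated once per matching review, and employees with no review are kept with NA in the review columns. The sketch below uses small hypothetical tibbles, not the project files, to make this visible. The relationship argument (dplyr 1.1.0 and later) states the expected cardinality, so dplyr can check it.

```r
library(dplyr)

# Hypothetical toy versions of the two tables (not the real project data)
employees <- tibble(
  EmployeeID = c("E1", "E2", "E3"),
  Attrition  = c("No", "Yes", "No")
)
reviews <- tibble(
  EmployeeID    = c("E1", "E1", "E2"),
  ManagerRating = c(3, 4, 2)
)

# left_join keeps every employee: E1 appears twice (two reviews),
# E3 appears once with ManagerRating = NA (no review)
merged_toy <- left_join(employees, reviews, by = "EmployeeID",
                        relationship = "one-to-many")

# Number of rows each employee contributes after the join
rows_per_employee <- count(merged_toy, EmployeeID)
print(rows_per_employee)
```

With the real data, running count(hr_perf_dta, EmployeeID) right after the join shows whether rows are employee-level or review-level. This matters when you later interpret rates computed from hr_perf_dta.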
## import the two datasets (paths are relative to the project folder, as in the variable description import)
employee_dta <- read.csv("dataset/Employee.csv")
perf_rating_dta <- read.csv("dataset/PerformanceRating.csv")
## merge employee_dta and perf_rating_dta using the left_join function
## and save the merged dataset as hr_perf_dta
hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")
## use the datatable function from DT to display the merged dataset
datatable(hr_perf_dta)
4.2 Data management
- Using the clean_names function from the janitor package, standardize the variable names following the recommended naming convention (snake_case).
- Save the renamed variables as hr_perf_dta to update the dataset.
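clean_names() converts column headers to snake_case. Here is a minimal sketch on a hypothetical one-row data frame whose headers mimic the raw CSV columns used in this project (EmployeeID, FirstName, JobRole):

```r
library(janitor)

# Hypothetical data frame with headers in the style of the raw CSV files
raw <- data.frame(EmployeeID = 1, FirstName = "A", JobRole = "Manager",
                  check.names = FALSE)

# clean_names() returns the same data with snake_case column names
cleaned <- clean_names(raw)
print(names(cleaned))
```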
## clean names using the janitor packages and save as hr_perf_dta
hr_perf_dta <- hr_perf_dta %>% clean_names()
## display the renamed hr_perf_dta using datatable function
datatable(hr_perf_dta)
- Create a new variable cat_education where education is coded as 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.
- Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.
- Create new variables cat_work_life_balance, cat_self_rating, and cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code them as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.
- Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code it as No = 0 and Yes = 1.
- Save all the changes in hr_perf_dta. Note that saving the changes under the same name updates the dataset with the new variables.
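Writing one case_when() per column repeats the same five labels many times. One possible alternative, not the approach required by the exercise, is to store the labels in a named lookup vector and apply it to several columns at once with across(). The sketch uses hypothetical toy data. Note that .names = "cat_{.col}" produces names such as cat_job_satisfaction rather than the cat_job_sat names requested above, so you would need to rename the columns if you adopt it.

```r
library(dplyr)

# Hypothetical toy data standing in for hr_perf_dta
toy <- tibble(
  environment_satisfaction = c(1, 3, 5),
  job_satisfaction         = c(2, 4, NA)
)

# Named lookup vector: names are the numeric codes, values are the labels
sat_labels <- c(
  `1` = "Very dissatisfied", `2` = "Dissatisfied", `3` = "Neutral",
  `4` = "Satisfied", `5` = "Very satisfied"
)

# Recode several columns at once; codes without a label (or NA) become NA
toy <- toy |>
  mutate(across(
    c(environment_satisfaction, job_satisfaction),
    ~ unname(sat_labels[as.character(.x)]),
    .names = "cat_{.col}"
  ))
print(toy)
```

A second lookup vector holding the "Unacceptable" to "Above and beyond" labels would cover work_life_balance, self_rating, and manager_rating in the same way.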
## create cat_education
colnames(hr_perf_dta)
[1] "employee_id" "first_name"
[3] "last_name" "gender"
[5] "age" "business_travel"
[7] "department" "distance_from_home_km"
[9] "state" "ethnicity"
[11] "education" "education_field"
[13] "job_role" "marital_status"
[15] "salary" "stock_option_level"
[17] "over_time" "hire_date"
[19] "attrition" "years_at_company"
[21] "years_in_most_recent_role" "years_since_last_promotion"
[23] "years_with_curr_manager" "performance_id"
[25] "review_date" "environment_satisfaction"
[27] "job_satisfaction" "relationship_satisfaction"
[29] "training_opportunities_within_year" "training_opportunities_taken"
[31] "work_life_balance" "self_rating"
[33] "manager_rating"
hr_perf_dta <- hr_perf_dta %>% mutate(cat_education = case_when(
education == 1 ~ "No formal education",
education == 2 ~ "High school",
education == 3 ~ "Bachelor",
education == 4 ~ "Masters",
education == 5 ~ "Doctorate",
TRUE ~ NA_character_
))
## create cat_envi_sat, cat_job_sat, and cat_relation_sat
hr_perf_dta <- hr_perf_dta %>% mutate(cat_envi_sat = case_when(
environment_satisfaction == 1 ~ "Very dissatisfied",
environment_satisfaction == 2 ~ "Dissatisfied",
environment_satisfaction == 3 ~ "Neutral",
environment_satisfaction == 4 ~ "Satisfied",
environment_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_
)) %>%
# Recode job satisfaction
mutate(cat_job_sat = case_when(
job_satisfaction == 1 ~ "Very dissatisfied",
job_satisfaction == 2 ~ "Dissatisfied",
job_satisfaction == 3 ~ "Neutral",
job_satisfaction == 4 ~ "Satisfied",
job_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_
)) %>%
# Recode relationship satisfaction
mutate(cat_relation_sat = case_when(
relationship_satisfaction == 1 ~ "Very dissatisfied",
relationship_satisfaction == 2 ~ "Dissatisfied",
relationship_satisfaction == 3 ~ "Neutral",
relationship_satisfaction == 4 ~ "Satisfied",
relationship_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_))
datatable(hr_perf_dta)
## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
hr_perf_dta <- hr_perf_dta %>% mutate(cat_work_life_balance = case_when(
work_life_balance == 1 ~ "Unacceptable",
work_life_balance == 2 ~ "Needs improvement",
work_life_balance == 3 ~ "Meets expectation",
work_life_balance == 4 ~ "Exceeds expectation",
work_life_balance == 5 ~ "Above and beyond",
TRUE ~ NA_character_
)) %>%
# Recode self-rating
mutate(cat_self_rating = case_when(
self_rating == 1 ~ "Unacceptable",
self_rating == 2 ~ "Needs improvement",
self_rating == 3 ~ "Meets expectation",
self_rating == 4 ~ "Exceeds expectation",
self_rating == 5 ~ "Above and beyond",
TRUE ~ NA_character_
)) %>%
# Recode manager rating
mutate(cat_manager_rating = case_when(
manager_rating == 1 ~ "Unacceptable",
manager_rating == 2 ~ "Needs improvement",
manager_rating == 3 ~ "Meets expectation",
manager_rating == 4 ~ "Exceeds expectation",
manager_rating == 5 ~ "Above and beyond",
TRUE ~ NA_character_
))
datatable(hr_perf_dta)
## create bi_attrition
hr_perf_dta <- hr_perf_dta %>%
mutate(bi_attrition = if_else(attrition == "Yes", 1, 0))
## print the updated hr_perf_dta using the datatable function
datatable(hr_perf_dta)
5 Exploratory data analysis
5.1 Descriptive statistics of employee attrition
- Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.
- Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate, group the dataset by job role. Afterward, you can use the count function to get the frequency of attrition for each job role and then divide it by the total number of observations in that job role. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.
- The plot of the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!
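The same summarise-and-divide pattern is repeated below for every grouping variable. One way to avoid the repetition is a small helper function. attrition_rate_by is hypothetical and not part of any package. It uses dplyr's {{ }} embracing to pass the grouping column, and is sketched here on toy data:

```r
library(dplyr)

# Hypothetical helper (not from any package): attrition rate (%) by any grouping column
attrition_rate_by <- function(data, group_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(
      total_employees = n(),
      total_attrition = sum(attrition == "Yes", na.rm = TRUE),
      .groups = "drop"  # returns an ungrouped tibble
    ) |>
    mutate(pct_attrition = total_attrition / total_employees * 100)
}

# Toy data standing in for attrition_key_var_dta
toy <- tibble(
  job_role  = c("Recruiter", "Recruiter", "Manager", "Manager", "Manager", "Manager"),
  attrition = c("Yes", "No", "No", "No", "No", "Yes")
)

rates <- attrition_rate_by(toy, job_role)
print(rates)
```

With the project data, a call such as attrition_rate_by(attrition_key_var_dta, department) would follow the same pattern.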
if(!require(dplyr)) install.packages("dplyr"); library(dplyr)
## selecting attrition key variables and save as `attrition_key_var_dta`
attrition_key_var_dta <- hr_perf_dta %>%
select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance)
# Compute attrition rate by job_role
attrition_rate_job_role <- employee_dta %>%
group_by(JobRole) %>%
summarise(
total_employees = n(),
total_attrition = sum(Attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
# Print the attrition_rate_job_role
datatable(attrition_rate_job_role)
# Attrition Rate by Department
attrition_rate_department <- employee_dta %>%
group_by(Department) %>%
summarise(
total_employees = n(),
total_attrition = sum(Attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
# Print Attrition Rate Department
datatable(attrition_rate_department)
# Compute attrition rate by Age Group
attrition_rate_age <- employee_dta %>%
# include.lowest = TRUE keeps age 20 in the "20-30" group; ages outside 20-60 become NA
mutate(age_group = cut(Age, breaks = c(20, 30, 40, 50, 60), labels = c("20-30", "31-40", "41-50", "51-60"), include.lowest = TRUE)) %>%
group_by(age_group) %>%
summarise(
total_employees = n(),
total_attrition = sum(Attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
#Print Attrition Rate Age
datatable(attrition_rate_age)
# Compute attrition rate by Salary (note: this groups by each distinct salary value)
attrition_rate_salary <- employee_dta %>%
group_by(Salary) %>%
summarise(
total_employees = n(),
total_attrition = sum(Attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
# Print Attrition Rate Salary
datatable(attrition_rate_salary)
# Compute attrition rate by Job Satisfaction
attrition_rate_satisfaction <- hr_perf_dta %>%
group_by(job_satisfaction) %>%
summarise(
total_employees = n(),
total_attrition = sum(attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
#Print Attrition Rate Satisfaction
datatable(attrition_rate_satisfaction)
# Compute attrition rate by Work Life Balance
attrition_rate_work_life <- hr_perf_dta %>%
group_by(work_life_balance) %>%
summarise(
total_employees = n(),
total_attrition = sum(attrition == "Yes", na.rm = TRUE)
) %>%
mutate(pct_attrition = total_attrition / total_employees * 100) %>%
ungroup()
#Print Attrition Rate Work Life Balance
datatable(attrition_rate_work_life)
## print attrition_rate_job_role
print(attrition_rate_job_role)
# A tibble: 13 × 4
JobRole total_employees total_attrition pct_attrition
<chr> <int> <int> <dbl>
1 Analytics Manager 52 3 5.77
2 Data Scientist 261 62 23.8
3 Engineering Manager 75 2 2.67
4 HR Business Partner 7 0 0
5 HR Executive 28 3 10.7
6 HR Manager 4 0 0
7 Machine Learning Engineer 146 10 6.85
8 Manager 37 2 5.41
9 Recruiter 24 9 37.5
10 Sales Executive 327 57 17.4
11 Sales Representative 83 33 39.8
12 Senior Software Engineer 132 9 6.82
13 Software Engineer 294 47 16.0
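Grouping by the exact Salary value, as in the salary step above, creates one group per distinct salary. Most groups are therefore likely to be tiny, with attrition rates that jump between 0% and 100%. A common alternative is to bin salary first. The sketch below uses dplyr::ntile() on hypothetical toy data. The choice of 4 bands is arbitrary; with the project data, something like ntile(Salary, 10) on employee_dta would give deciles.

```r
library(dplyr)

# Hypothetical toy data: 20 salaries, attrition more common at lower pay
toy <- tibble(
  salary    = seq(10000, 200000, by = 10000),
  attrition = c(rep("Yes", 3), rep("No", 2),   # lowest band: 3 of 5 left
                rep("Yes", 2), rep("No", 3),   # second band: 2 of 5 left
                rep("Yes", 1), rep("No", 4),   # third band: 1 of 5 left
                rep("No", 5))                  # highest band: none left
)

# Split salary into 4 equal-sized bands, then compute the attrition rate per band
salary_band_rates <- toy |>
  mutate(salary_band = ntile(salary, 4)) |>
  group_by(salary_band) |>
  summarise(
    min_salary      = min(salary),
    max_salary      = max(salary),
    total_employees = n(),
    pct_attrition   = mean(attrition == "Yes") * 100,
    .groups = "drop"
  )
print(salary_band_rates)
```

Because each band holds the same number of employees, the rates are comparable across bands. The result can be plotted with salary_band on the x-axis instead of raw Salary.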
# Load libraries
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
# Plot attrition rate by Job Role with a diverging gradient fill
ggplot(attrition_rate_job_role, aes(x = JobRole, y = pct_attrition, fill = pct_attrition)) +
geom_bar(stat = "identity") +
labs(title = "Attrition Rate by Job Role", x = "Job Role", y = "Attrition Rate (%)") +
theme_light() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_gradient2(low = "cyan", mid = "pink", high = "violet", midpoint = median(attrition_rate_job_role$pct_attrition))
# Plot attrition rate by Department
ggplot(attrition_rate_department, aes(x = Department, y = pct_attrition, fill = Department)) +
geom_bar(stat = "identity") +
labs(title = "Attrition Rate by Department", x = "Department", y = "Attrition Rate (%)") +
theme_classic() +
scale_fill_brewer(palette = "Dark2") +
theme(
plot.background = element_rect(fill = "#f5cac3"), # Change to your desired background color
panel.background = element_rect(fill = "#ffffff"), # Change to your desired panel background color
legend.background = element_rect(fill = "lightgray") # Optional: change legend background color
)
# Plot attrition rate by Age Group
ggplot(attrition_rate_age, aes(x = age_group, y = pct_attrition, fill = age_group)) +
geom_bar(stat = "identity") +
labs(title = "Attrition Rate by Age Group", x = "Age Group", y = "Attrition Rate (%)") +
theme_minimal() +
scale_fill_brewer(palette = "Accent") +
theme(
plot.background = element_rect(fill = "#ccd5ae"), # Customize background color
panel.background = element_rect(fill = "#e9edc9"), # Panel background color
legend.background = element_rect(fill = "#fefae0") # Legend background color
)
# Plot attrition rate by Salary
ggplot(attrition_rate_salary, aes(x = Salary, y = pct_attrition)) +
geom_line(color = "#22223b") + # Customize line color here
labs(title = "Attrition Rate by Salary", x = "Salary", y = "Attrition Rate (%)") +
theme_minimal() +
theme(
plot.background = element_rect(fill = "white"),
panel.background = element_rect(fill = "lightgray")
)
# Plot attrition rate by Job Satisfaction
ggplot(attrition_rate_satisfaction, aes(x = job_satisfaction, y = pct_attrition)) +
geom_bar(stat = "identity", fill = "#3CCBAE") +
labs(title = "Attrition Rate by Job Satisfaction", x = "Job Satisfaction", y = "Attrition Rate (%)") +
theme_minimal()
# Plot attrition rate by Work Life Balance
ggplot(attrition_rate_work_life, aes(x = work_life_balance, y = pct_attrition, fill = work_life_balance)) +
geom_bar(stat = "identity") +
labs(title = "Attrition Rate by Work Life Balance", x = "Work Life Balance", y = "Attrition Rate (%)") +
theme_minimal()
5.2 Identifying attrition key drivers using correlation analysis
- Conduct a correlation analysis of the key variables bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Use the cor() function to run the correlation analysis. Remove missing values using na.omit() before running the correlation analysis. Save the output in hr_corr.
- Use a correlation matrix or heatmap to visualize the relationship between these variables and attrition. You can use the ggcorr function from the GGally package to visualize the correlation heatmap; see the GGally ggcorr documentation for more information.
- Discuss which factors seem most correlated with attrition and what that suggests about why employees are leaving.
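Missing values must be removed before calling cor(). The sketch below uses hypothetical toy data to show two equivalent ways of doing this. Keeping the selected columns in a separate object (hr_corr_dta would be a hypothetical name) instead of overwriting hr_perf_dta also leaves the other variables available for later sections.

```r
# Hypothetical toy data with scattered missing values
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(1, NA, 3, 2, 5)
)

# Option 1: drop incomplete rows first, then correlate
m_naomit <- cor(na.omit(toy))

# Option 2: let cor() perform the same casewise deletion
m_complete <- cor(toy, use = "complete.obs")

print(m_complete)
```

Both options keep only rows 1, 3 and 4, where every column is observed. By contrast, use = "pairwise.complete.obs" would use a different set of rows for each pair of variables.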
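## conduct correlation of key variables
## note: this overwrites hr_perf_dta with only these six columns (rows with missing values dropped),
## so self_rating and the cat_* variables are no longer available in later sections
hr_perf_dta <- na.omit(hr_perf_dta[, c("bi_attrition", "salary", "years_at_company", "job_satisfaction", "manager_rating", "work_life_balance")])
## compute and print hr_corr
hr_corr <- cor(hr_perf_dta)
print(hr_corr)
bi_attrition salary years_at_company job_satisfaction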
bi_attrition 1.000000000 -0.211181478 -0.6896527798 0.0132368129
salary -0.211181478 1.000000000 0.2206442116 0.0053054850
years_at_company -0.689652780 0.220644212 1.0000000000 0.0008700583
job_satisfaction 0.013236813 0.005305485 0.0008700583 1.0000000000
manager_rating -0.007654429 -0.001596736 0.0178656879 -0.0158205481
work_life_balance 0.003428836 -0.001517145 0.0079339508 0.0417242942
manager_rating work_life_balance
bi_attrition -0.007654429 0.003428836
salary -0.001596736 -0.001517145
years_at_company 0.017865688 0.007933951
job_satisfaction -0.015820548 0.041724294
manager_rating 1.000000000 0.007996938
work_life_balance 0.007996938 1.000000000
## install GGally package and use ggcorr function to visualize the correlation
if (!require("GGally")) install.packages("GGally")
if (!require("ggplot2")) install.packages("ggplot2")
library(GGally)
library(ggplot2)
# Create the correlation matrix plot
correlation_plot <- ggcorr(
hr_perf_dta,
label = TRUE,
label_round = 2,
label_size = 3,
nbreaks = NULL,
palette = "RdPu",
name = "Correlation",
layout.exp = 2,
hjust = 1,
size = 3
) +
scale_fill_gradient2(
low = "#FFE6E6",
mid = "#FF69B4",
high = "#4B0082",
midpoint = 0,
space = "Lab",
guide = "colourbar"
) +
ggtitle("Correlation Matrix of Key Variables with Attrition") +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(
hjust = 0.5,
face = "bold",
size = 11,
margin = margin(t = 0, b = 20),
color = "black"
),
plot.title.position = "plot",
axis.text.x = element_text(
angle = 45,
vjust = 1,
hjust = 0.5,
size = 8
),
axis.text.y = element_text(
size = 10,
hjust = 1,
margin = margin(r = 10, l = 10)
),
legend.position = "right",
legend.title = element_text(face = "bold"),
legend.text = element_text(size = 9),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt"),
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "white", color = NA)
)
print(correlation_plot)
Based on my analysis of the correlation matrix concerning employee attrition, several insights can be drawn:
Years at Company exhibits the strongest correlation with attrition, at -0.69. Employees with shorter tenures are considerably more likely to leave, which may reflect weaker connections to, or commitment to, the organization among newer employees.
Salary shows a weaker negative correlation of -0.21. Lower salaries are associated with higher rates of employee turnover, highlighting the role of financial compensation in employee retention.
Job Satisfaction has a negligible correlation of 0.01, suggesting that job satisfaction ratings have almost no linear relationship with attrition in this dataset.
Manager Rating shows a very weak correlation of -0.01, suggesting that managers' evaluations have minimal association with employee attrition.
Work-Life Balance shows essentially no correlation with attrition (0.003), indicating that this factor may not significantly influence employees' decisions to remain with or leave the organization.
Overall, tenure is the factor most strongly associated with attrition, followed by compensation. Newer and lower-paid employees appear to be at the highest risk of turnover. Note that these correlations describe associations, not causes.
To effectively reduce attrition, it is recommended that organizations focus on:
- Developing targeted retention strategies for newer employees.
- Reviewing and improving salary structures.
- Monitoring job satisfaction, even though it showed little association with attrition in this analysis.
In conclusion, improving the early-tenure experience and addressing compensation should be prioritized to improve employee retention.
5.3 Predictive modeling for attrition
- Create a logistic regression model to predict employee attrition using the following variables: salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Save the model as hr_attrition_glm_model. Print the summary of the model using the summary function.
- Install the sjPlot package and use the tab_model function to display the summary of the model. You may read the sjPlot documentation on how to customize your model summary.
- Also, use the plot_model function to visualize the model coefficients. You may read the sjPlot documentation on how to customize your model visualization.
- Discuss the results of the logistic regression model and what they suggest about the factors that contribute to employee attrition.
## run a logistic regression model to predict employee attrition
## save the model as hr_attrition_glm_model
hr_attrition_glm_model <- glm(
bi_attrition ~ salary + years_at_company + job_satisfaction + manager_rating + work_life_balance,
data = hr_perf_dta,
family = binomial
)
## print the summary of the model using the summary function
summary(hr_attrition_glm_model)
Call:
glm(formula = bi_attrition ~ salary + years_at_company + job_satisfaction +
manager_rating + work_life_balance, family = binomial, data = hr_perf_dta)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.571e+00 2.173e-01 11.831 <2e-16 ***
salary -3.633e-06 4.086e-07 -8.893 <2e-16 ***
years_at_company -6.333e-01 1.476e-02 -42.919 <2e-16 ***
job_satisfaction 3.470e-02 3.186e-02 1.089 0.276
manager_rating 5.071e-03 3.810e-02 0.133 0.894
work_life_balance 2.587e-02 3.198e-02 0.809 0.419
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8574.5 on 6708 degrees of freedom
Residual deviance: 4781.6 on 6703 degrees of freedom
AIC: 4793.6
Number of Fisher Scoring iterations: 5
## install sjPlot package and use tab_model function to display the summary of the model
if(!require(sjPlot)) install.packages("sjPlot"); library(sjPlot)
tab_model(hr_attrition_glm_model, show.ci = TRUE, show.p = TRUE, show.se = TRUE, title = "Logistic Regression Model for Attrition")
Dependent variable: bi attrition

| Predictors | Odds Ratios | std. Error | CI | p |
|---|---|---|---|---|
| (Intercept) | 13.08 | 2.84 | 0.00 – Inf | <0.001 |
| salary | 1.00 | 0.00 | 0.00 – Inf | <0.001 |
| years at company | 0.53 | 0.01 | 0.00 – Inf | <0.001 |
| job satisfaction | 1.04 | 0.03 | 0.00 – Inf | 0.276 |
| manager rating | 1.01 | 0.04 | 0.00 – Inf | 0.894 |
| work life balance | 1.03 | 0.03 | 0.00 – Inf | 0.419 |
| Observations | 6709 | | | |
| R2 Tjur | 0.502 | | | |
# Use plot_model function to visualize the model coefficients
p <- plot_model(hr_attrition_glm_model,
show.values = TRUE,
value.size = 3,
title = "Coefficients of Attrition Logistic Regression Model",
title.size = 12,
axis.title.size = 12,
axis.text.size = 11,
vline.color = "#f4a261",
dot.size = 4,
line.size = 1.5,
grid = TRUE,
grid.color = "#2a9d8f",
colors = "#e76f51",
value.offset = 0.3
)
p + theme(
plot.background = element_rect(fill = "lightblue"),
plot.title = element_text(color = "#264653",
size = 11,
face = "bold"),
panel.background = element_rect(fill = "#faedcd"),
panel.grid.major = element_line(color = "white"),
panel.grid.minor = element_blank()
)
Salary
The odds ratio for salary rounds to 1.00 (p < 0.001) because it is expressed per one-unit increase in salary. The coefficient (-3.633e-06) is highly significant: for every 10,000 increase in salary, the odds of leaving fall by roughly 3.6%, holding the other predictors constant. Salary therefore does predict attrition; the effect only looks negligible because of the scale of the variable.
Years at Company
This factor has an odds ratio of 0.53 (p < 0.001). Each additional year at the company is associated with roughly 47% lower odds of leaving, holding the other predictors constant. This makes tenure the strongest predictor in the model. Part of this association may be mechanical, since employees who leave stop accumulating tenure.
Job Satisfaction
The odds ratio for job satisfaction is 1.04 (p = 0.276), meaning each one-point increase in job satisfaction is associated with about 4% higher odds of leaving. However, this result is not statistically significant, so the model provides no evidence that job satisfaction affects whether employees stay or go.
Manager Rating
With an odds ratio of 1.01 (p = 0.894), manager ratings have little association with attrition. The high p-value shows that manager ratings are not significant predictors of whether employees will leave.
Work-Life Balance
The odds ratio for work-life balance is 1.03 (p = 0.419), meaning each one-point increase in work-life balance is associated with about 3% higher odds of leaving. Again, this result is not significant, indicating that work-life balance does not strongly influence employee retention in this model.
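The odds ratios reported by tab_model are exp(coefficient) for a one-unit change in each predictor. Rescaling before exponentiating makes the salary effect readable. The sketch below uses the estimates printed in the summary output above; the 10,000 step is an arbitrary choice.

```r
# Coefficients copied from the summary(hr_attrition_glm_model) output above
b_salary <- -3.633e-06   # change in log-odds per 1-unit increase in salary
b_years  <- -6.333e-01   # change in log-odds per additional year at the company

# Odds ratio for a 10,000 increase in salary: exp(10000 * b)
or_salary_10k <- exp(10000 * b_salary)

# Odds ratio for one more year at the company
or_years <- exp(b_years)

# Percent change in the odds of leaving
pct_change <- (c(salary_10k = or_salary_10k, years = or_years) - 1) * 100
print(round(pct_change, 1))
```

With the fitted model in the session, exp(coef(hr_attrition_glm_model)["salary"] * 10000) gives the same quantity without copying numbers by hand.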
- Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine whether there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.
- Install the report package and use the report function to generate a report of the t-test results.
- Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.
- Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.
- Provide recommendations on whether revising compensation policies could be an effective retention strategy.
## compare the average monthly income of employees who left and those who stayed
attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)
## print the results of the t-test
print(attrition_ttest_results)
Welch Two Sample t-test
data: salary by bi_attrition
t = 19.074, df = 5557.5, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
39387.67 48411.52
sample estimates:
mean in group 0 mean in group 1
125856.35 81956.76
## install the report package and use the report function to generate a report of the t-test results
if(!require(report)) install.packages("report"); library(report)
report_ttest <- report(attrition_ttest_results)
# Print the report
report_ttest
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.26e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43899.59, 95% CI [39387.67, 48411.52], t(5557.53) = 19.07, p < .001; Cohen's d
= 0.51, 95% CI [0.46, 0.57])
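The report describes the effect size with Cohen's d (0.51 here, labelled "medium"). The sketch below shows what that number measures. It assumes the common pooled-standard-deviation definition and uses hypothetical toy groups. The report package's exact computation for a Welch test may differ, so treat this as illustrative only.

```r
# Hypothetical toy groups (not project data)
g0 <- c(2, 4, 6)   # e.g. salaries of employees who stayed
g1 <- c(1, 2, 3)   # e.g. salaries of employees who left

n0 <- length(g0)
n1 <- length(g1)

# Pooled standard deviation of the two groups
sd_pooled <- sqrt(((n0 - 1) * var(g0) + (n1 - 1) * var(g1)) / (n0 + n1 - 2))

# Cohen's d: difference in means expressed in pooled-SD units
cohens_d <- (mean(g0) - mean(g1)) / sd_pooled
print(cohens_d)
```

A d of about 0.5 means the two group means are roughly half a pooled standard deviation apart, so the salary distributions of leavers and stayers still overlap considerably.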
# install the ggstatsplot package if needed and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed
if (!require(ggstatsplot)) install.packages("ggstatsplot"); library(ggstatsplot)
#Use ggbetweenstats to create the plot
ggbetweenstats(
data = hr_perf_dta,
x = bi_attrition,
y = salary,
xlab = "Attrition (0 = Stayed, 1 = Left)",
ylab = "Monthly Income",
title = "Distribution of Monthly Income for Employees Who Left vs Stayed",
ggtheme = ggplot2::theme_minimal()
)
library(ggplot2)
# Create histogram and frequency polygon of salary for employees who left and those who stayed
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0),
# and linewidth replaces size for lines (ggplot2 >= 3.4.0)
ggplot(hr_perf_dta, aes(x = salary)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 5000,
fill = "#b56576",
color = "#e56b6f",
alpha = 0.4) +
geom_freqpoly(aes(y = after_stat(density)),
binwidth = 5000,
color = "#6d597a",
linewidth = 1) +
facet_wrap(~ bi_attrition,
labeller = as_labeller(c(`0` = "Stayed", `1` = "Left"))) +
labs(title = "Salary Distribution of Employees Who Stayed vs. Left",
x = "Monthly Salary",
y = "Density") +
theme_minimal() +
theme(
plot.background = element_rect(fill = "#eaac8b"),
plot.title = element_text(color = "#264653",
size = 11,
face = "bold"),
panel.background = element_rect(fill = "#faedcd"),
panel.grid.major = element_line(color = "white"),
panel.grid.minor = element_blank()
)
Based on the salary distribution visualizations presented, there is a clear association between compensation levels and employee retention.
Analysis of the data reveals that the mean salary of retained employees (approximately $125,900) is significantly higher than that of employees who departed (approximately $82,000; Welch t-test, p < .001). The distribution patterns indicate that employees who remained with the organization have a wider salary range extending into higher compensation brackets, while those who left are predominantly clustered in lower salary ranges.
Regarding the effectiveness of compensation policy revision as a retention strategy, the data suggests this would be a viable approach for several reasons:
The substantial disparity in mean salaries between retained and departing employees indicates that compensation level is a significant factor in retention decisions.
The concentration of departures in lower salary ranges suggests that addressing compensation in these brackets could yield improved retention outcomes.
The salary distribution of retained employees provides a benchmark for potentially effective compensation levels.
Therefore, implementing revised compensation policies, particularly focusing on employees in lower salary brackets, could serve as an effective retention strategy. The data supports that employees receiving higher compensation demonstrate greater likelihood of remaining with the organization.
It is recommended that the organization consider structured salary adjustments targeting the identified at-risk compensation ranges to potentially improve retention rates.
5.4 Employee satisfaction and performance analysis
- Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and summarise functions to calculate the average performance ratings (and counts) for each group.
- Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.
- Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.
- Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.
- Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.
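For the SelfRating bar plot, raw counts can be hard to compare because many more employees stayed than left. The sketch below is a variant that plots each rating's share within its attrition group, shown on hypothetical toy data. In the project it needs a data frame that still contains self_rating and bi_attrition; the six-column hr_perf_dta created in 5.2 does not.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for a data frame that still has self_rating
toy <- tibble(
  self_rating  = c(3, 4, 4, 5, 3, 2, 4, 5),
  bi_attrition = c(0, 0, 1, 0, 1, 1, 0, 0)
)

# Count each rating within each attrition group, then convert counts to shares
self_rating_props <- toy |>
  count(bi_attrition, self_rating) |>
  group_by(bi_attrition) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# Side-by-side bars of within-group shares
self_rating_plot <- ggplot(
  self_rating_props,
  aes(x = factor(self_rating), y = prop, fill = factor(bi_attrition))
) +
  geom_col(position = "dodge") +
  labs(
    title = "Self Rating by Attrition Status (share of each group)",
    x = "Self Rating", y = "Share of group",
    fill = "Attrition (0 = Stayed, 1 = Left)"
  ) +
  theme_minimal()

print(self_rating_plot)
```

Replacing self_rating with manager_rating gives the matching manager-rating plot.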
# Average performance ratings of employees who left vs. those who stayed.
# self_rating is not available here because the correlation step in 5.2 overwrote
# hr_perf_dta with a six-column subset; in that case only manager_rating is averaged.
if ("self_rating" %in% names(hr_perf_dta)) {
avg_ratings <- hr_perf_dta %>%
group_by(bi_attrition) %>%
summarise(
avg_manager_rating = mean(manager_rating, na.rm = TRUE),
avg_self_rating = mean(self_rating, na.rm = TRUE),
count_employees = n()
)
} else {
avg_ratings <- hr_perf_dta %>%
group_by(bi_attrition) %>%
summarise(
avg_manager_rating = mean(manager_rating, na.rm = TRUE),
count_employees = n()
)
}
# Print the average ratings for each group
print(avg_ratings)
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.
# (This requires self_rating, which is not in the six-column hr_perf_dta created in 5.2.)
# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.
ggplot(hr_perf_dta, aes(x = manager_rating, fill = as.factor(bi_attrition))) +
geom_bar(position = "dodge") +
labs(
title = "Distribution of Manager Rating for Employees Who Stayed vs Left",
x = "Manager Rating",
y = "Count",
fill = "Attrition (0 = Stayed, 1 = Left)"
) + scale_fill_manual(values = c("#cbdfbd", "#f19c79")) +
theme_minimal()

# Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
ggplot(hr_perf_dta, aes(x = factor(job_satisfaction), y = salary, fill = factor(bi_attrition))) +
geom_boxplot() +
labs(
title = "Salary Distribution by Job Satisfaction and Attrition Status",
x = "Job Satisfaction",
y = "Salary",
fill = "Attrition (0 = Stayed, 1 = Left)"
) +
scale_fill_manual(values = c("#452137", "#bc8a7e")) +
theme_minimal()

Comparing the average performance ratings of employees who left with those who stayed points to possible drivers of attrition. Employees with lower manager ratings and self ratings appear more likely to leave. This may indicate that they feel unsupported by their managers or dissatisfied with their own performance. The bar plot shows that those who left often rated themselves lower. The box plot of salary by job satisfaction also suggests that lower pay is associated with higher turnover among less satisfied employees. This is an association, not proof that low pay causes people to leave.
To address these issues, HR should improve manager training to boost employee engagement, encourage regular self-assessments, and ensure salaries are competitive through salary reviews. Enhancing job satisfaction can also be achieved by offering flexible work options, opportunities for professional development, and employee recognition programs. Regular employee surveys and thorough exit interviews can provide insights into why employees leave and highlight areas for improvement. By adopting these strategies, the organization can create a more engaged workforce and reduce turnover.
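The discussion above states that employees with lower manager ratings are more likely to leave. One direct way to check this is to compute the attrition rate at each rating level, as the next section does for work-life balance. The sketch below generalizes that calculation into a helper. The name attrition_rate_by and the toy data are illustrative assumptions, and the helper assumes bi_attrition is coded 1 = left, 0 = stayed.

```r
library(dplyr)

# Hypothetical helper (not part of the original project): attrition rate (%)
# at each level of a rating column. Assumes bi_attrition is coded
# 1 = left and 0 = stayed, as elsewhere in this analysis.
attrition_rate_by <- function(data, rating_col) {
  data %>%
    group_by(level = .data[[rating_col]]) %>%
    summarise(
      total_employees = n(),
      total_attrition = sum(bi_attrition == 1),
      attrition_rate  = total_attrition / total_employees * 100,
      .groups = "drop"
    )
}

# Toy data for illustration only; these are NOT values from the HR dataset
toy_ratings <- data.frame(
  manager_rating = c(2, 2, 3, 3, 3, 4, 4, 5),
  bi_attrition   = c(1, 0, 1, 0, 0, 0, 0, 0)
)

rates <- attrition_rate_by(toy_ratings, "manager_rating")
print(rates)
```

With the project data, attrition_rate_by(hr_perf_dta, "manager_rating") would show whether the attrition rate actually falls as the manager rating rises. Rates are easier to compare than raw counts when the groups differ in size.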
5.5 Work-life balance and retention strategies
At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:
Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.
work_life_balance_summary <- hr_perf_dta %>%
group_by(bi_attrition, work_life_balance) %>%
summarise(count = n(), .groups = "drop")
print(work_life_balance_summary)
# A tibble: 10 × 3
   bi_attrition work_life_balance count
          <dbl>             <int> <int>
 1            0                 1    84
 2            0                 2  1134
 3            0                 3  1090
 4            0                 4  1146
 5            0                 5   994
 6            1                 1    37
 7            1                 2   568
 8            1                 3   580
 9            1                 4   560
10            1                 5   516
Use visualizations to show the differences.
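The two groups differ in size. The counts above sum to 4,448 employees who stayed and 2,261 who left, so raw counts are not directly comparable. Before visualizing, it can help to convert the counts into the share of each rating within each attrition group. The sketch below re-enters the printed counts by hand; the name wlb_counts is illustrative.

```r
library(dplyr)

# Counts copied by hand from the work_life_balance_summary output above
wlb_counts <- tibble(
  bi_attrition      = rep(c(0, 1), each = 5),
  work_life_balance = rep(1:5, times = 2),
  count             = c(84, 1134, 1090, 1146, 994,
                        37,  568,  580,  560, 516)
)

# Share of each rating within each attrition group, so that groups of
# different sizes (stayed vs. left) can be compared directly
wlb_shares <- wlb_counts %>%
  group_by(bi_attrition) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

print(wlb_shares)
```

With the live data, the same group_by() and mutate() steps can be applied directly to work_life_balance_summary. On these counts, the shares for stayers and leavers differ by less than 1.2 percentage points at every rating, so the two groups have very similar work-life balance distributions.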
library(ggplot2)
# Create the bar plot for WorkLifeBalance
ggplot(hr_perf_dta, aes(x = factor(work_life_balance), fill = factor(bi_attrition))) +
geom_bar(position = "dodge", color = "black", linewidth = 0.5) +
geom_text(stat = "count", aes(label = after_stat(count)), position = position_dodge(0.9), vjust = -0.5, size = 4) +
labs(
title = "Employee Work-Life Balance: Comparing Those Who Stayed vs. Left",
x = "Work-Life Balance Rating",
y = "Number of Employees",
fill = "Attrition Status\n(0 = Stayed, 1 = Left)"
) +
theme_minimal(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
axis.title = element_text(size = 14),
legend.position = "top",
legend.title = element_text(face = "italic"),
panel.grid.major = element_line(color = "lightgrey"),
panel.grid.minor = element_blank()
) +
scale_fill_manual(values = c("#f49097", "#dfb2f4")) +
scale_x_discrete(labels = c("1" = "Poor", "2" = "Fair", "3" = "Good", "4" = "Very Good", "5" = "Excellent"))
Assess whether employees with poor work-life balance are more likely to leave.
# Compute attrition rate by WorkLifeBalance
attrition_rate_wlb <- hr_perf_dta %>%
group_by(work_life_balance) %>%
summarise(
total_employees = n(),
total_attrition = sum(bi_attrition == 1),
attrition_rate = (total_attrition / total_employees) * 100
)
# Print the attrition rate summary
print(attrition_rate_wlb)
# A tibble: 5 × 4
  work_life_balance total_employees total_attrition attrition_rate
              <int>           <int>           <int>          <dbl>
1                 1             121              37           30.6
2                 2            1702             568           33.4
3                 3            1670             580           34.7
4                 4            1706             560           32.8
5                 5            1510             516           34.2
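To test formally whether work-life balance rating and attrition are associated, a Pearson chi-square test of independence can be run on the 2-by-5 table of counts printed earlier. The sketch below re-enters those counts by hand. With the full data, chisq.test(table(hr_perf_dta$bi_attrition, hr_perf_dta$work_life_balance)) should be equivalent, assuming there are no missing values. The name wlb_table is illustrative.

```r
# Contingency table built from the work_life_balance_summary counts above:
# rows = attrition status, columns = work-life balance rating 1..5
wlb_table <- matrix(
  c(84, 1134, 1090, 1146, 994,
    37,  568,  580,  560, 516),
  nrow = 2, byrow = TRUE,
  dimnames = list(attrition = c("stayed", "left"),
                  work_life_balance = as.character(1:5))
)

# Pearson chi-square test of independence between attrition and rating
# (no continuity correction is applied to tables larger than 2-by-2)
wlb_chisq <- chisq.test(wlb_table)
print(wlb_chisq)
```

On these counts, the test does not reach conventional significance (p > 0.05). This is consistent with the attrition rates above ranging only from about 30.6% to 34.7%.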
library(ggplot2)
# Visualize the attrition rate by WorkLifeBalance
ggplot(attrition_rate_wlb, aes(x = factor(work_life_balance), y = attrition_rate)) +
geom_col(fill = "#f28482") +
labs(
title = "Attrition Rate by Work-Life Balance Rating",
x = "Work-Life Balance Rating",
y = "Attrition Rate (%)"
) +
theme_minimal()
You are free to choose how you accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.
The analysis compares the work-life balance ratings of employees who left with those of employees who stayed. The bar charts show that, in both groups, most employees rate their work-life balance between 2 and 5, and very few choose a rating of 1. The attrition-rate table makes the comparison clearer. Attrition varies only from about 30.6% (rating 1) to 34.7% (rating 3), and the lowest rating actually has the lowest attrition rate. In this dataset, employees with poor work-life balance ratings are therefore not noticeably more likely to leave, and the rating on its own shows no clear pattern with attrition.
A chi-square test of independence, or a logistic regression of attrition on the work-life balance rating, is the appropriate way to check this formally. Given how close the rates are, a strong association is unlikely. Work-life balance may still matter in combination with other factors, such as job role or job satisfaction, which this one-variable comparison does not capture.
Even without a clear direct effect, HR initiatives that support work-life balance can contribute to a more supportive work environment. Examples include flexible work arrangements, wellness programs, and remote-work options. These measures can help reduce attrition and foster a more engaged and committed workforce when combined with two other steps. The first is action on factors that show a stronger link to attrition elsewhere in this analysis, such as job satisfaction and pay. The second is attention to employees who report low work-life balance.
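A logistic regression of attrition on the work-life balance rating, as referred to in the discussion above, can be fitted from the same printed counts. A binomial model on grouped data gives the same coefficient estimates as glm(bi_attrition ~ work_life_balance, family = binomial) fitted on one row per employee. This sketch treats the rating as numeric, which assumes each one-point step changes the log-odds of leaving by the same amount. Using factor(work_life_balance) instead would relax that assumption. The names wlb_agg, wlb_logit and wlb_or are illustrative.

```r
# Leavers and stayers per work-life balance rating (from the output above)
wlb_agg <- data.frame(
  work_life_balance = 1:5,
  left   = c(37, 568, 580, 560, 516),
  stayed = c(84, 1134, 1090, 1146, 994)
)

# Binomial logistic regression on grouped data: cbind(successes, failures)
# models the log-odds of leaving as a linear function of the rating
wlb_logit <- glm(cbind(left, stayed) ~ work_life_balance,
                 family = binomial, data = wlb_agg)
print(summary(wlb_logit))

# Odds ratio for a one-point increase in the work-life balance rating
wlb_or <- exp(coef(wlb_logit)[["work_life_balance"]])
print(wlb_or)
```

An odds ratio close to 1 with a large p-value indicates little linear association between the rating and the odds of leaving.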
5.6 Recommendations for HR interventions
Based on the analysis conducted, provide recommendations for HR interventions that could help reduce employee attrition and improve overall employee satisfaction and performance. You may use the following questions as a guide for your recommendations and discussion.
What are the key factors contributing to employee attrition in the company?
Answer: The analysis points to several issues that contribute to employees leaving the company. First, many employees report low job satisfaction, meaning they do not feel fulfilled or happy in their roles. Second, low ratings from their managers suggest that some employees may feel unsupported or undervalued by their supervisors. Third, work-life balance is a commonly cited concern. In this dataset, however, attrition rates are similar across all work-life balance ratings (roughly 31% to 35%), so its direct effect appears limited. Additionally, many employees feel that their pay is not competitive with what others in the market are earning, which may push them to seek employment elsewhere.
Which factors are most strongly correlated with attrition?
Answer: The correlation analysis shows that job satisfaction and salary are the factors most strongly related to employee attrition. Employees who report feeling unhappy in their jobs or believe they are underpaid are more likely to leave the company. Manager ratings also play a role. Employees who receive low ratings from their managers may feel unsupported or undervalued, which can lead them to look for jobs elsewhere. In summary, low job satisfaction, insufficient pay, and weak manager support are the main factors associated with employees leaving the company.
What strategies could be implemented to improve employee retention and satisfaction?
Answer: To improve employee retention and satisfaction, HR should implement several key strategies. First, providing training for managers can enhance their leadership and communication skills, leading to better support for their teams. It’s also important to regularly check market salaries to ensure that pay remains competitive, which can help reduce dissatisfaction. Offering flexible work options, such as remote work and adjustable hours, can help employees achieve a better work-life balance. Additionally, collecting employee feedback through surveys and focus groups can help identify areas for improvement. Finally, creating opportunities for career advancement through training and mentorship shows employees that the company values their growth. Together, these strategies can foster a more positive work environment and encourage employees to stay with the company.
How can HR leverage the insights from the analysis to develop effective retention strategies?
Answer: HR can use insights from the analysis to focus on initiatives that directly address the main reasons employees leave, such as low job satisfaction and weak manager support. By using data, HR can create solutions that meet the actual needs of employees. This could involve improving management training and enhancing the workplace culture. When HR understands the specific issues affecting employees, it can develop better solutions that lead to a happier and more engaged workforce. Focusing on these employee concerns can help reduce turnover and create a positive work environment.
What are the potential benefits of implementing these strategies for the company?
Answer: Implementing these strategies can bring numerous benefits to the organization. First and foremost, reducing turnover rates will save the company money on recruitment and training costs. When employees feel supported and satisfied, they tend to be more engaged and productive, which directly improves overall performance. A happier workforce also helps create a positive company culture, making the organization more attractive to potential employees. This improved reputation can help the company attract and keep top talent. In the long run, prioritizing employee satisfaction builds loyalty, lowers attrition rates, and supports the organization’s success. Overall, investing in employees is not just good practice; it is essential for thriving in today’s competitive job market.
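The second answer refers to a correlation analysis that ranks factors by how strongly they relate to attrition. The sketch below builds such a ranking. It assumes the numeric columns of interest sit in one data frame alongside a 0/1 bi_attrition column. For a binary outcome, Pearson's r equals the point-biserial correlation, and it measures only linear association. The helper cor_with_attrition and the toy data are illustrative, not the project's own code.

```r
# Hypothetical helper: correlation of every numeric column with a 0/1
# outcome column, sorted by absolute strength (strongest first).
cor_with_attrition <- function(data, outcome = "bi_attrition") {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  num_cols <- setdiff(num_cols, outcome)
  r <- vapply(num_cols, function(col) {
    cor(data[[col]], data[[outcome]], use = "complete.obs")
  }, numeric(1))
  r[order(abs(r), decreasing = TRUE)]
}

# Toy data for illustration only; these are NOT values from the HR dataset
toy_corr <- data.frame(
  job_satisfaction = c(1, 2, 2, 3, 4, 4, 5, 5),
  salary           = c(40, 42, 45, 50, 60, 62, 70, 75),
  bi_attrition     = c(1, 1, 0, 1, 0, 0, 0, 0)
)

r_toy <- cor_with_attrition(toy_corr)
print(r_toy)
```

With the project data, this would be cor_with_attrition(hr_perf_dta). Drop any numeric ID columns first, because their correlations with attrition are meaningless.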